Abstract - Using random matrix theory, we study the spatial structure of the Chinese stock market, the American stock market and global market indices. After taking into account the signs of the components in the eigenvectors of the cross-correlation matrix, we detect the subsector structure of the financial systems. The positive and negative subsectors are anti-correlated with each other in the corresponding eigenmode. The subsector structure is strong in the Chinese stock market, while somewhat weaker in the American stock market and global market indices. Characteristics of the subsector structures in different markets are revealed.

Introduction. In recent years, financial dynamics, which exhibits various collective behaviors, has attracted much attention from physicists [IHS]. Statistical properties of price fluctuations and cross-correlations between individual stocks are of great interest, not only for quantitatively unveiling the complex structure of financial systems, but also practically for asset allocation and portfolio risk estimation [10-12]. The probability distribution of price returns usually exhibits a power-law tail, which is a robust characteristic of stock markets [13-15], while the higher-order time correlations and interactions between stocks are less universal [BTfTlfTB]. In some cases, price returns may also show a Poisson-like distribution [niCEB].

It is an important and challenging topic to explore the 'spatial' structure in financial systems. For example, the hierarchical structure of stock markets has been investigated through the minimal spanning tree method and its variants [19-23]. With the random matrix theory (RMT), business sectors and topology communities may be identified |16U24l ErJ]. The RMT method was first developed for complex quantum systems in which the interactions between subunits are unknown [27,28]. The structure of business sectors has been examined for mature markets such as the New York Stock Exchange (NYSE) and the Korean Stock Exchanges [24 ,25,29 {22], and also for some emerging markets such as the National Stock Exchange in India [16]. Very recently, the RMT method was applied to identify the dominant eigenmodes in the indices of industrial production [33]. In particular, the structure of interactions between stocks has been investigated for the Chinese stock market with the RMT method [6]. As an important emerging market, the Chinese market exhibits stronger cross-correlations than the mature ones. At the same time, the effect of the standard business sectors is weak in the Chinese market. Instead, unusual sectors such as the ST and Blue-chip sectors are detected.

In this paper, with the RMT method, we aim at a further understanding of the spatial structure. Our observation is that the components in an eigenvector of the cross-correlation matrix may show positive and negative signs. To the best of our knowledge, the role played by the signs of the components has not been explored. Our main finding is that the signs of the components in an eigenvector may classify a sector into two subsectors, which are anti-correlated with each other within this eigenmode. This goes beyond what one may gain with standard methods such as the minimal spanning tree and its variants in the analysis of the 'spatial' structure in financial systems [19-23].

Methods and basics. — We have collected the daily 
data of 259 stocks traded on the Shanghai Stock Exchange (SSE) from Jan. 1997 to Nov. 2007, in total 2633 days. The daily data of 259 stocks on the NYSE are from Jan. 1990 to Dec. 2006, in total 4286 days. Meanwhile, we have collected the daily data of a set of 66 financial indices, including 57 stock market indices and 9 US treasury bond rates, from Sep. 1997 to Oct. 2008, in total 2669 days. We name the 66 indices the global market indices (GMI). The data of the SSE are taken from the 'Wind Financial Database' (http://www.wind.com.cn), and the data of the NYSE and GMI are from 'Yahoo Finance' (http://finance.yahoo.com).

If the price of a stock is absent on a particular day, we assume that it is the same as on the preceding day [34]. It has been pointed out that the missing data do not result in artifacts [16]. All markets concerned in this paper have normal trading sessions on all days of the week except Saturdays, Sundays and holidays declared in advance, with the exception of the Egyptian and Tel Aviv Stock Exchanges. For the latter two stock markets, trading takes place from Sunday to Thursday. To align the time series, we simply move the Sunday data to Friday for these two markets. For a comprehensive understanding of the cross-correlations of financial markets, the US bond rates are also included in our analysis; these comprise 9 indices ranging from the 3-month to the 20-year rates.
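As a minimal sketch of the rule for missing prices described above, the following Python snippet copies the preceding day's price into any gap. The function name `forward_fill`, the use of NaN to mark a missing price and the (days x stocks) array layout are assumptions made for illustration; they are not taken from the original analysis.

```python
import numpy as np

def forward_fill(prices):
    """Replace each missing price (NaN) with the price of the preceding day.

    prices: 2-D array-like of shape (days, stocks); NaN marks a missing price
    (an assumed convention). A gap on the very first day is left as NaN.
    """
    filled = np.array(prices, dtype=float)  # copy, so the input is not modified
    for t in range(1, filled.shape[0]):
        missing = np.isnan(filled[t])
        filled[t, missing] = filled[t - 1, missing]
    return filled

# Toy example with two stocks over four days (illustrative numbers only).
raw = [[10.0, 20.0],
       [np.nan, 21.0],
       [np.nan, np.nan],
       [11.0, 22.0]]
filled = forward_fill(raw)
print(filled)
```

A forward-filled price yields a zero logarithmic return on that day, since the logarithms of two equal prices cancel.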

We define the logarithmic price return of the i-th stock over a time interval $\Delta t$ as

$$R_i(t) \equiv \ln P_i(t+\Delta t) - \ln P_i(t), \qquad (1)$$

where $P_i(t)$ represents the price of the stock at time t. To give different stocks an equal weight, we introduce the normalized price return

$$r_i(t) \equiv \frac{R_i(t) - \langle R_i(t) \rangle}{\sigma_i}, \qquad (2)$$

where $\langle \cdots \rangle$ is the average over time t, and $\sigma_i = \sqrt{\langle R_i^2 \rangle - \langle R_i \rangle^2}$ denotes the standard deviation of $R_i$. Then, the elements of the cross-correlation matrix C are defined by the equal-time correlations

$$C_{ij} \equiv \langle r_i(t)\, r_j(t) \rangle. \qquad (3)$$

By definition, C is a real symmetric matrix with $C_{ii} = 1$, and $C_{ij}$ takes values in the interval $[-1, 1]$.
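The three definitions above translate directly into a few lines of array code. The sketch below is only an illustration: the function name `cross_correlation`, the (days x stocks) layout and the synthetic price series are assumptions, and no market data are used.

```python
import numpy as np

def cross_correlation(prices, dt=1):
    """Equal-time cross-correlation matrix of normalized log returns.

    prices: array of shape (days, stocks), one column per stock (assumed layout).
    dt: return interval in days.
    """
    log_p = np.log(np.asarray(prices, dtype=float))
    R = log_p[dt:] - log_p[:-dt]              # Eq. (1): logarithmic returns
    r = (R - R.mean(axis=0)) / R.std(axis=0)  # Eq. (2): zero mean, unit variance
    return r.T @ r / r.shape[0]               # Eq. (3): time average of r_i * r_j

# Synthetic prices (not market data): random walks sharing a common factor.
rng = np.random.default_rng(0)
T, N = 500, 5
common = rng.normal(size=(T, 1))
steps = 0.01 * (0.6 * common + rng.normal(size=(T, N)))
prices = 100.0 * np.exp(np.cumsum(steps, axis=0))

C = cross_correlation(prices)
print(np.round(C, 2))
```

Note that `R.std` uses the population standard deviation, which matches $\sigma_i = \sqrt{\langle R_i^2 \rangle - \langle R_i \rangle^2}$ and makes the diagonal of C equal to one.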

The mean value of the elements $C_{ij}$ for the SSE is 0.37, much larger than 0.16 and 0.26 for the NYSE and GMI respectively. This confirms that stock prices in emerging markets are more correlated than those in mature ones [ToTl35ll36]. The correlation between financial indices in the GMI is smaller than that of the SSE, but larger than that of the NYSE.

We now compute the eigenvalues of the cross-correlation matrix C, in comparison with those of the so-called Wishart matrix, which is derived from non-correlated time series. Assuming N time series of length T, in the large-N and large-T limit with $Q \equiv T/N > 1$, the probability distribution $P_{rm}(\lambda)$ of the eigenvalue $\lambda$ for the Wishart matrix is given by [37,38]

$$P_{rm}(\lambda) = \frac{Q}{2\pi} \frac{\sqrt{(\lambda^{ran}_{max} - \lambda)(\lambda - \lambda^{ran}_{min})}}{\lambda}, \qquad (4)$$

with the upper and lower bounds

$$\lambda^{ran}_{min(max)} = \left[ 1 \pm \left( 1/\sqrt{Q} \right) \right]^2. \qquad (5)$$

For a dynamic system, large eigenvalues of the cross-correlation matrix which deviate from $P_{rm}(\lambda)$ imply that there exist non-random interactions. In fact, in both mature and emerging stock markets, the bulk of the eigenvalue spectrum $P(\lambda)$ of the cross-correlation matrix is similar to $P_{rm}(\lambda)$ of the Wishart matrix, but some large eigenvalues deviate significantly from the upper bound $\lambda^{ran}_{max}$. This scenario looks similar for the GMI. Let us arrange the large eigenvalues in the order $\lambda_\alpha > \lambda_{\alpha+1}$. As shown in table 1, the largest eigenvalue $\lambda_0$ of the SSE (China) is 97.33, about 56 times as large as the upper bound $\lambda^{ran}_{max}$ of $P_{rm}(\lambda)$, while $\lambda_0$ of the NYSE (US) and GMI is 45.61 and 21.53, about 29 and 16 times as large as $\lambda^{ran}_{max}$ respectively.
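The quoted ratios can be checked from the bounds of the Wishart spectrum and the dataset sizes given in the text. The short Python sketch below does this; the helper name `wishart_bounds` is hypothetical, and T is taken to be the number of days quoted for each dataset, which is an assumption about how Q was evaluated.

```python
import math

def wishart_bounds(T, N):
    """Lower and upper eigenvalue bounds [1 -/+ 1/sqrt(Q)]^2 with Q = T / N."""
    Q = T / N
    lower = (1.0 - 1.0 / math.sqrt(Q)) ** 2
    upper = (1.0 + 1.0 / math.sqrt(Q)) ** 2
    return lower, upper

# (T, N, largest eigenvalue) as quoted in the text for each dataset.
datasets = {
    "SSE": (2633, 259, 97.33),
    "NYSE": (4286, 259, 45.61),
    "GMI": (2669, 66, 21.53),
}

ratios = {}
for name, (T, N, lam0) in datasets.items():
    lower, upper = wishart_bounds(T, N)
    ratios[name] = lam0 / upper
    print(f"{name}: upper bound = {upper:.3f}, lambda_0 / upper bound = {ratios[name]:.1f}")
```

Under these assumptions the ratios come out close to 56, 29 and 16, in line with the values stated above.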

According to previous works [6,16,24,39], the large eigenvalues deviating from the bulk correspond to different modes of motion in stock markets. The components in the eigenvector of the largest eigenvalue $\lambda_0$ are uniformly distributed. Therefore, the largest eigenvalue represents the market mode, which is driven by interactions common to stocks in the entire market. The components in the eigenvectors of the other large eigenvalues are localized. A particular eigenvector is dominated by a sector of stocks, usually associated with a business sector. By $u_i(\lambda_\alpha)$, we denote the component of the i-th stock in the eigenvector of $\lambda_\alpha$. To identify the sector, one may introduce a threshold $u_c$ to select the dominating components in the eigenvector by $|u_i(\lambda_\alpha)| > u_c$ [5]. The threshold $u_c$ is determined by two criteria. Firstly, if the matrix is random, $\langle |u(\lambda)| \rangle \sim 1/\sqrt{N}$ for every eigenmode. Therefore, $u_c$ should be larger than $1/\sqrt{N}$. Secondly, $u_c$ should not be too large, otherwise too few stocks would remain in each sector.

In this paper, we show that the components in an eigenvector may carry positive and negative signs, and that the components with opposite signs are anti-correlated within this eigenmode. Inspired by this observation, we investigate the subsector structure of the financial markets by taking into account the signs of the components. In other words, we separate a sector into two subsectors by two thresholds $u_c^\pm = \pm u_c$: $u_i(\lambda_\alpha) > u_c^+$ and $u_i(\lambda_\alpha) < u_c^-$, which correspond to the positive and negative subsectors respectively.
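The selection rule with two thresholds can be sketched as follows. The function name `split_subsectors` and the example eigenvector are hypothetical and serve only to illustrate the rule $u_i > u_c^+$ and $u_i < u_c^-$.

```python
import numpy as np

def split_subsectors(u, u_c):
    """Indices of the positive (u_i > u_c) and negative (u_i < -u_c) subsectors."""
    u = np.asarray(u, dtype=float)
    positive = np.flatnonzero(u > u_c)
    negative = np.flatnonzero(u < -u_c)
    return positive, negative

# Hypothetical eigenvector components for seven stocks (not taken from the data).
u = np.array([0.45, 0.02, -0.38, 0.11, -0.05, -0.52, 0.60])
positive, negative = split_subsectors(u, u_c=0.10)
print("positive subsector:", positive)
print("negative subsector:", negative)
```

Since an eigenvector is defined only up to an overall sign, numerical routines may return it with all signs flipped. The two groups are then unchanged, and only the labels 'positive' and 'negative' are interchanged.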

Subsectors. According to reference [6], standard business sectors can hardly be detected in the SSE (China). Instead, one finds three unusual sectors, i.e., the ST, Blue-chip and SHRE sectors, corresponding to the second, third and fourth largest eigenvalues respectively. Which stocks dominate the eigenvectors of the other large eigenvalues remains puzzling.

In the SSE, a company is specially treated if its financial situation is abnormal, and the prefix "ST" is then added to its stock ticker. The prefix is removed when the financial situation returns to normal. In reference [6], the so-called ST sector consists of the "ST" stocks. On the other hand, the Blue-chip sector refers to companies with a national reputation and good performance, i.e., a reasonable positive profit over a period of time. Meanwhile, SHRE represents the companies registered in Shanghai that are in the real estate business.

Now we introduce two thresholds $u_c^\pm = \pm u_c$ to separate the dominating components in an eigenvector into two parts, i.e., $u_i(\lambda_\alpha) > u_c^+$ and $u_i(\lambda_\alpha) < u_c^-$, which are referred to as the positive and negative subsectors respectively. With this method, we are able to identify the subsectors of the SSE up to the seventh largest eigenvalue $\lambda_6$, and to achieve a deeper understanding of the unusual sectors such as the ST and Blue-chip sectors. The results are shown in table 2. The market mode described by the largest eigenvalue $\lambda_0$, for which all components in the eigenvector possess the same sign, is not included in the table.

The negative components in the eigenvector of the second largest eigenvalue $\lambda_1$ are dominated by the ST stocks. With the threshold $u_c^- = -0.10$, for example, 23 dominating stocks are selected, and 20 of them are ST stocks. Therefore this subsector is called the ST subsector. For the positive components in the eigenvector of $\lambda_1$, we could not identify a common feature of the dominating stocks. In fact, as the threshold $u_c^+$ increases, the number of dominating stocks shrinks. For example, with the threshold $u_c^+ = 0.10$, there are only 7 dominating stocks, and half of them are also ST stocks. In reference [6], therefore, the whole sector of $\lambda_1$ is called the ST sector. The negative components in the eigenvector of the third largest eigenvalue $\lambda_2$ clearly define the high technology subsector, while the positive ones are dominated by the traditional industry stocks. Stocks in both subsectors are Blue-chip stocks. Therefore, these two subsectors together are ascribed to the Blue-chip sector in reference [6]. For the fourth largest eigenvalue $\lambda_3$, the SHRE sector detected in reference [6] splits into two subsectors, i.e., the SHRE and ST subsectors. Consistent with the result in reference [6], half of the ST stocks are also SHRE stocks. But the ST stocks here are different from those for $\lambda_1$.

In reference [6], the sector structure is explored only up to $\lambda_3$. With the exploration of the subsector structure, we are able to go a step further. For $\lambda_4$, the positive and negative subsectors are identified as the weakly and strongly cyclical industries respectively. The former includes stocks which fluctuate little with the economic cycle, such as daily consumer goods and services, while the latter, including basic materials and energy resources, booms or slumps with the economic cycle. The positive components in the eigenvectors of $\lambda_5$ and $\lambda_6$ are dominated by the finance and non-daily consumer subsectors, although the negative ones remain unknown.

Taking into account the signs of the components in the eigenvector, one may explore the subsector structure in the SSE up to $\lambda_6$. A number of standard business subsectors such as high technology and finance are also observed. But the SSE is indeed dominated by unusual sectors and subsectors such as the ST, Blue-chip, traditional industry, SHRE, and weakly and strongly cyclical industries. In China, companies do not operate strictly within their registered business. Therefore, standard business subsectors are rarely observed. From the viewpoint of behavioral psychology, investors in China pay particular attention to the performance of the companies, their dominating business and areas, etc. Therefore, unusual sectors such as the ST, Blue-chip and SHRE emerge.

For comparison, we also apply this method to study the subsector structure in the NYSE (US). The results are listed in table 3. The subsector structure in the NYSE is somewhat different from that in the SSE. It is generally believed that the standard business subsectors should dominate the eigenvectors of the large eigenvalues. Additionally, it would be expected that there exists only one dominating subsector in an eigenvector, probably under certain conditions, e.g., when the total number of stocks is sufficiently large. To clarify these issues, our results are presented up to the thresholds $u_c^\pm = \pm 0.12$. As shown in table 3, most subsectors are indeed standard business subsectors. For $\lambda_1$, $\lambda_2$, $\lambda_6$ and $\lambda_{11}$, only one dominating subsector remains for sufficiently large thresholds $u_c^\pm$. For $\lambda_3$, $\lambda_7$, $\lambda_8$ and $\lambda_9$, however, there are two dominating subsectors. For our dataset of the NYSE, our method thus also provides a deeper understanding of the spatial structure.

Finally, as shown in table 4, the subsectors in the GMI can be identified with the thresholds $u_c^\pm = \pm 0.15$, exclusively in terms of the areas to which the indices belong. Different from the SSE and NYSE, the eigenvector of the largest eigenvalue $\lambda_0$ of the GMI does not describe the so-called 'market mode', which represents the global motion of the financial system. This may reflect the fact that the financial markets of the world have not yet reached such a unified state. The first, second and third largest eigenvalues correspond to the US, Asia-Pacific and Bond sectors, with only a single dominating subsector. The US sector mainly consists of the indices in the US, except for the GSPTSE from Canada and GDAXI from Germany. This result reflects that the US is the dominating economy in the world. From $\lambda_3$ to $\lambda_7$, two dominating subsectors emerge. One important feature of the subsector structure is that the indices in mainland China or in Hong Kong always form an independent subsector. On the other hand, the US bond rates do not mix with the indices in stock markets. For $\lambda_6$, the short-term and long-term bond rates are separated into the positive and negative subsectors respectively.

Anti-correlation between subsectors. What is the physical meaning of the positive and negative subsectors? The cross-correlation between two stocks can be written as

$$C_{ij} = \sum_{\alpha=1}^{N} \lambda_\alpha C^\alpha_{ij}, \qquad C^\alpha_{ij} = u^\alpha_i u^\alpha_j, \qquad (6)$$

where $\lambda_\alpha$ is the $\alpha$-th eigenvalue, $u^\alpha_i$ is the i-th component in the eigenvector of $\lambda_\alpha$, and $C^\alpha_{ij}$ represents the cross-correlation in the $\alpha$-th eigenmode. In other words, the cross-correlation between two stocks can be decomposed into contributions from different eigenmodes. Since the eigenvalue $\lambda_\alpha$ is always positive, it gives the weight of the $\alpha$-th eigenmode, and the sign of $C^\alpha_{ij}$ is essential in the sum. According to Eq. (6), $C^\alpha_{ij}$ is positive if the components $u^\alpha_i$ and $u^\alpha_j$ have the same sign in a particular eigenmode. Otherwise, it is negative. When $C^\alpha_{ij}$ is negative, the two stocks are said to be anti-correlated in this eigenmode: when the price return of the i-th stock is positive, the price return of the j-th stock tends to be negative in the statistical sense. Therefore, all stocks in the same subsector are positively correlated in this eigenmode, while stocks in different subsectors are anti-correlated. This is the physical meaning of the subsectors. For the NYSE, however, only a number of sectors split into two subsectors. This suggests that the spatial structure and interactions among the stocks in the SSE are more complicated.
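The decomposition into eigenmodes can be verified numerically. The sketch below uses synthetic, uncorrelated returns rather than market data, and the variable names are assumptions. It rebuilds C from its eigenmodes and confirms that the sign of the mode contribution $u^\alpha_i u^\alpha_j$ is fixed by the signs of the two components.

```python
import numpy as np

# Synthetic normalized returns (illustrative only, not market data).
rng = np.random.default_rng(1)
r = rng.normal(size=(400, 6))
r = (r - r.mean(axis=0)) / r.std(axis=0)
C = r.T @ r / r.shape[0]

# eigh returns eigenvalues in ascending order; reorder so that index 0 is the largest.
lam, U = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

# Mode contributions C^alpha_ij = u_i^alpha * u_j^alpha, stacked along the first axis.
C_modes = np.einsum("ia,ja->aij", U, U)
C_rebuilt = np.einsum("a,aij->ij", lam, C_modes)

# In a given eigenmode, the contribution is positive exactly when the signs agree.
alpha = 1
signs = np.sign(U[:, alpha])
same_sign = signs[:, None] == signs[None, :]
print(np.allclose(C_rebuilt, C), np.all((C_modes[alpha] > 0) == same_sign))
```

The eigenvalue ordering here starts from index 0 for the largest eigenvalue, following the labelling $\lambda_0 > \lambda_1 > \ldots$ used for the tables.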

Let us examine some examples in the SSE. The sector of $\lambda_2$ is composed of the traditional industry and high technology subsectors. The former represents traditional industry companies with a long-term and stable interest, but a lower asset risk and expected revenue, while the latter includes high technology companies with novel business and concepts, but a higher asset risk and potential profit. In a period when the stock market is uncertain, for example, investors prefer the traditional industries with a lower risk, and their stock prices rise more than those of the high technology companies. In another period, when the stock market is booming, the situation is reversed. Thus, these two subsectors are anti-correlated in the eigenmode of $\lambda_2$. The sector of $\lambda_4$ consists of the weakly and strongly cyclical industry subsectors. Both subsectors are unusual, but their anti-correlation seems obvious. The weakly and strongly cyclical industries are weakly and strongly correlated with the macro-economic environment respectively. Thus, investors prefer the strongly cyclical industry when the macro-economy is booming, and rather choose the weakly cyclical industry when the macro-economy declines. In reference [6], the sector of $\lambda_3$ is identified as the SHRE sector. Now this sector splits into the ST and SHRE subsectors. In fact, half of the ST stocks also belong to the SHRE stocks. This suggests that investors care much about whether the financial situation is normal or abnormal, even for the SHRE companies.

In the NYSE, the subsector structure of $\lambda_3$, $\lambda_7$ and $\lambda_9$ is understandable. Daily consumer goods and services are considered traditional industries, while high technology and finance belong to another category. These two sorts of stocks may show an anti-correlation, consistent with the subsector structure of $\lambda_2$ in the SSE. For $\lambda_2$ with the thresholds $u_c^\pm = \pm 0.08$, a weak subsector structure is observed in the NYSE. From their intrinsic properties, daily consumer goods and basic materials are classified as weakly and strongly cyclical industries respectively. This is similar to the case of $\lambda_4$ in the SSE. For $\lambda_8$, the subsectors may also be explained along the lines above.

In the GMI, two examples are typical. The first one is the subsectors of $\lambda_5$, where all components except for one are indices in the American stock markets. One subsector is composed of the IIX, IXIC, NDX, NWX, PSE and SOXX. Most of these indices are related to information technology, the semiconductor industry, the internet industry, etc., with a potentially high payoff and asset risk. The other subsector consists of the XMI, DJA, DJI, DJU and DJX. Most of them are for the weighted and traditional companies, which share the general features of a stable cash flow and mature business model, but a lower profit. These two subsectors are anti-correlated in this eigenmode. The second example is the subsectors of $\lambda_6$, which are obviously anti-correlated since they are just the short-term and long-term bond rates in the US. For $\lambda_3$, $\lambda_4$ and $\lambda_7$, the subsector structure indicates that the stock markets in China are somewhat special.

To quantitatively measure the anti-correlation between the positive and negative subsectors, we construct the combinations of stocks in the two subsectors, $I_\pm(t) = \sum_i u_i^\pm(\alpha)\, r_i(t)$, and compute the cross-correlation

$$C_{+-}(\alpha) = \langle I_+(t)\, I_-(t) \rangle. \qquad (7)$$

Here $u_i^\pm(\alpha)$ is the i-th positive or negative component in the $\alpha$-th eigenmode selected by the thresholds, e.g., $u_c^\pm = \pm 0.08$. In figure 1, the cross-correlation $C_{+-}(\alpha)$ is shown for the SSE and NYSE, in comparison with that between two random combinations of stocks. This result is not qualitatively sensitive to whether one introduces the thresholds $u_c^\pm$ to select the dominating components.

In figure 1, we observe that $C_{+-}(\alpha)$ increases monotonically with $\alpha$, and gradually approaches the value for two random combinations of stocks. $C_{+-}(\alpha)$ computed with $I_\pm(t)$ is smaller than that for two random combinations of stocks because of the anti-correlation between the positive and negative subsectors.

What matrix structure results in the subsector structure? Let us consider a 4 x 4 cross-correlation matrix,

$$C_{4\times 4} = \begin{pmatrix} 1 & 0.55 & 0.15 & 0.11 \\ 0.55 & 1 & 0.39 & 0.34 \\ 0.15 & 0.39 & 1 & 0.95 \\ 0.11 & 0.34 & 0.95 & 1 \end{pmatrix}, \qquad (8)$$

which is taken from the $\lambda_6$ sector of the GMI. The first and second indices represent the 3-month and 6-month bond rates respectively, identified as the positive subsector. The third and fourth indices are the 10-year and 20-year bond rates, identified as the negative subsector. Obviously, the matrix elements $C_{ij}$ within the same subsector, i.e., in the diagonal blocks, are larger than those between the positive and negative subsectors, i.e., in the off-diagonal blocks.
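To illustrate how such a block structure produces components of opposite sign, one may diagonalize the 4 x 4 matrix of Eq. (8) on its own. This is only a toy calculation under a stated assumption: the eigenvectors of this submatrix are not the eigenvectors of the full 66 x 66 matrix of the GMI, so the sketch shows the mechanism rather than reproducing the $\lambda_6$ eigenmode.

```python
import numpy as np

# The 4 x 4 matrix of Eq. (8): indices 0, 1 are the short-term bond rates,
# indices 2, 3 the long-term bond rates.
C4 = np.array([
    [1.00, 0.55, 0.15, 0.11],
    [0.55, 1.00, 0.39, 0.34],
    [0.15, 0.39, 1.00, 0.95],
    [0.11, 0.34, 0.95, 1.00],
])

# eigh returns ascending eigenvalues; reorder so that index 0 is the largest.
lam, U = np.linalg.eigh(C4)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

u0 = U[:, 0]  # largest eigenvalue: a common mode of the four rates
u1 = U[:, 1]  # second largest eigenvalue: separates the two blocks by sign

print("eigenvalues:", np.round(lam, 3))
print("signs of u0:", np.sign(u0))
print("signs of u1:", np.sign(u1))
```

In this toy case all components of `u0` share one sign, while `u1` takes one sign on the two short-term rates and the opposite sign on the two long-term rates. The overall sign of each eigenvector is arbitrary.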

To verify the anti-correlation more intuitively, therefore, we may calculate the average $C_{ij}$ within the positive or negative subsector, and between the positive and negative subsectors. The results for the NYSE are shown in figure 2, and those for the SSE are similar. The average $C_{ij}$ within the positive or negative subsector is obviously much larger than that between the positive and negative subsectors, especially for small $\alpha$, i.e., large eigenvalues. This strongly suggests that there indeed exists an anti-correlation between the positive and negative subsectors. However, we should keep in mind that the anti-correlation in a particular eigenmode is only a part of the cross-correlation between two stocks, as shown in Eq. (6). How to make use of this anti-correlation theoretically and practically remains challenging.
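The block averages discussed above can be sketched for the 4 x 4 matrix of Eq. (8). The helper name `mean_offdiag` is hypothetical, and excluding the diagonal elements $C_{ii} = 1$ from the within-subsector average is an assumption, since the text does not state whether they are included.

```python
import numpy as np

C4 = np.array([
    [1.00, 0.55, 0.15, 0.11],
    [0.55, 1.00, 0.39, 0.34],
    [0.15, 0.39, 1.00, 0.95],
    [0.11, 0.34, 0.95, 1.00],
])
pos = [0, 1]  # positive subsector: 3-month and 6-month bond rates
neg = [2, 3]  # negative subsector: 10-year and 20-year bond rates

def mean_offdiag(C, idx):
    """Average C_ij over pairs i != j inside one subsector (diagonal excluded)."""
    sub = C[np.ix_(idx, idx)]
    n = len(idx)
    return (sub.sum() - np.trace(sub)) / (n * (n - 1))

within_pos = mean_offdiag(C4, pos)
within_neg = mean_offdiag(C4, neg)
between = C4[np.ix_(pos, neg)].mean()
print(within_pos, within_neg, between)
```

For this matrix the averages within the two subsectors are 0.55 and 0.95, while the average between them is 0.2475.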

Conclusion. — With the RMT method, we have in- 
vestigated the spatial structure of the SSE (China), NYSE 
(US) and GMI. Taking into account the signs of the components in the eigenvectors of the cross-correlation matrix, a sector may split into two subsectors, which are anti-correlated with each other in the corresponding eigenmode. The results are shown in tables 2, 3 and 4. The NYSE is dominated by the standard business sectors and subsectors, and the GMI is controlled by the area sectors and subsectors, but without the market mode. In contrast, the SSE exhibits unusual sectors and subsectors.

The subsector structure is strong in the SSE, while somewhat weaker in the NYSE and GMI. The anti-correlation between the positive and negative subsectors in an eigenmode can be measured by $C_{+-}(\alpha)$ in Eq. (7) and by the average $C_{ij}$ within the positive or negative subsector and between the positive and negative subsectors, as shown in figures 1 and 2.

Table 1: $\lambda^{ran}_{min(max)}$ denotes the lower (upper) bound of the eigenvalues of the Wishart matrix, while $\lambda_{min}$, $\lambda_0$, $\lambda_1$ and $\lambda_2$ represent the lower bound of the eigenvalues and the three largest eigenvalues of the real systems respectively.

Table 2: The subsectors in the SSE. The fraction is the number of well identified stocks over the total number of stocks in the subsector. Null: no obvious category; ST: specially treated; Trad: traditional industry; Tech: high technology; SHRE: Shanghai real estate; Weak: weakly cyclical industry; Stro: strongly cyclical industry; Fin: finance; IG: industrial goods; Util: utility; Basic: basic materials; Heal: health care; CG: daily consumer goods; Serv: services.

Table 3: The subsectors in the NYSE. The abbreviations are explained in the caption of table 2.

Table 4: The subsector structure in the GMI. The thresholds are $u_c^\pm = \pm 0.15$. The items in bold italics are those not belonging to the areas. NorA refers to North America, and b3m and b1y are the 3-month and 1-year bonds.

Fig. 1: $C_{+-}(\alpha)$ for the SSE and NYSE, compared with that between two random combinations of stocks.

Fig. 2: The average cross-correlation $C_{ij}$ for the NYSE.